Why differencing works to detrend a time series

In time-series analysis, detecting and removing trends is crucial for accurate modeling and forecasting. This post explores why differencing works so effectively at detrending time-series data, offering insights into its mechanics and practical implications.

Loading in the data

For this showcase, I'll be using the classic Air Passengers dataset.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

air_passengers = pd.read_csv('./AirPassengers.csv', index_col='Month', parse_dates=True)
air_passengers.head()

Output:


            Passengers
Month
1949-01-01         112
1949-02-01         118
1949-03-01         132
1949-04-01         129
1949-05-01         121

Checking for missing values

air_passengers.info()

Output:

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 144 entries, 1949-01-01 to 1960-12-01
Data columns (total 1 columns):
 #   Column      Non-Null Count  Dtype
---  ------      --------------  -----
 0   Passengers  144 non-null    int64
dtypes: int64(1)
memory usage: 2.2 KB

There are a total of 144 entries and 144 of them are non-null, so there are no missing values.

Plotting the data

plt.figure(figsize=(10, 6))
plt.plot(air_passengers)
plt.title('Monthly International Airline Passengers')
plt.xlabel('Date')
plt.ylabel('Number of Passengers')
plt.show()

Understanding the trend

Now, there is clearly a positive trend in the time series, but we can't yet tell whether it is linear or something more complex. It looks mostly linear, so that's a reasonable starting point. To begin, we will fit a linear regression to the series and plot both on the same graph.

Performing linear regression

# Create a time index feature
air_passengers['TimeIndex'] = np.arange(len(air_passengers))

# Split into features (X) and target (y)
X = air_passengers[['TimeIndex']] # Double brackets keep X two-dimensional, as scikit-learn expects
y = air_passengers['Passengers']

# Initialize the linear regression model
model = LinearRegression()

# Fit the model to the data
model.fit(X, y)

# Predict the values
predictions = model.predict(X)

Plotting the linear trend

plt.plot(air_passengers.index, y, label='Original Data')

# Plot the linear regression line
plt.plot(air_passengers.index, predictions, color='red', label='Linear Regression Line')

plt.title('Monthly International Airline Passengers with Linear Trend')
plt.xlabel('Date')
plt.ylabel('Number of Passengers')
plt.legend()
plt.show()

The line runs roughly through the center of the data, but it is way off at the start, so the trend most definitely isn't linear. For now, though, let's continue with a linear de-trending. We can de-trend a linearly-trending time series by subtracting the fitted linear trend from the original data.

Why does this detrending work?

Suppose that the series has a linear trend. We can model it as follows:

y_t = α + β·t + ϵ_t

Where α is the y-intercept, β is the slope, and ϵ_t is the random noise at time t.

Our linear regression attempts to find the values for α and β. So, our fitted line would be:

ŷ_t = α_p + β_p·t

where the subscript p stands for "predicted".

Now, if we take the difference between the two:

y_t - ŷ_t = (α - α_p) + (β - β_p)·t + ϵ_t

With a good fit, α_p ≈ α and β_p ≈ β, so both trend terms vanish. We've removed all elements of linearity from the series, and the only thing that remains is the random noise.

Plotting the de-trended graph

detrended = y - predictions

# Plot the detrended series
plt.figure(figsize=(10, 6))
plt.plot(air_passengers.index, detrended, label='Detrended Series')
plt.title('Detrended Airline Passenger Data (Subtracted Linear Trend)')
plt.xlabel('Date')
plt.ylabel('Detrended Value')
plt.legend()
plt.show()

Unfortunately, you can see there is still some pattern within the residuals. This means that the trend was not linear after all, and we need a more complex de-trending function.

De-trending this graph with a quadratic trend

We perform polynomial regression of degree 2 on the series:

# Import the necessary library for generating polynomial features
from sklearn.preprocessing import PolynomialFeatures

# Create a PolynomialFeatures object with degree 2
poly = PolynomialFeatures(degree=2)

# Transform the input features X into polynomial features of degree 2
X_poly = poly.fit_transform(X)

# Perform linear regression on the polynomial features
poly_model = LinearRegression()

poly_model.fit(X_poly, y)

poly_trend = poly_model.predict(X_poly)

# Plot the original data and the polynomial regression
plt.figure(figsize=(10, 6))
plt.plot(air_passengers.index, y, label='Original Data')
plt.plot(air_passengers.index, poly_trend, color='red', label='Polynomial Trend')
plt.title('Polynomial Trend in Airline Passenger Data')
plt.xlabel('Date')
plt.ylabel('Number of Passengers')
plt.legend()
plt.show()

Now this seems to be more accurate. We can perform the same de-trending as before:

poly_detrended = y - poly_trend
plt.figure(figsize=(10, 6))
plt.plot(air_passengers.index, poly_detrended, label='Detrended Series (Polynomial Trend Removed)', color='orange')
plt.title('Detrended Airline Passenger Data (Polynomial Trend Removed)')
plt.xlabel('Date')
plt.ylabel('Detrended Value')
plt.legend()
plt.show()

The reasoning for why this works is essentially the same as in the linear case; only the equations change.

Differencing

Why do we need differencing?

As you saw, there was a lot of trial-and-error in trying to figure out the trend by hand, and we're still not sure we found the right one. What if there is a slight trend hiding in our seemingly random residuals? We can perform the Ljung-Box test to rigorously check whether any pattern remains in the residuals. But if there were a simpler way of removing any trend, our lives would be much easier. And there is: differencing. Differencing simply subtracts from every data point its previous neighbor, as given by the equation:

y′_t = y_t - y_{t-1}

If this doesn't immediately detrend the series, we can perform the same operation again:

y″_t = y′_t - y′_{t-1}

And so on, until our series has been completely detrended.

Performing differencing on our graph

Differencing our series is extremely simple, since pandas already provides us with the functionality:

differenced = y.diff().dropna()

The dropna() call is there because the first element has no previous neighbor to subtract, so diff() returns NaN for it.
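You can see this by peeking at the raw diff; the values follow directly from the head of the table shown earlier:

y.diff().head()

Output:

Month
1949-01-01     NaN
1949-02-01     6.0
1949-03-01    14.0
1949-04-01    -3.0
1949-05-01    -8.0
Name: Passengers, dtype: float64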

Plotting the differenced graph:

# Plot the differenced series
plt.figure(figsize=(10, 6))
plt.plot(differenced, label='First Order Differenced Series')
plt.title('Detrended Airline Passenger Data (First Order Differencing)')
plt.xlabel('Date')
plt.ylabel('Differenced Value')
plt.legend()
plt.show()

You can clearly see that there is no trend left in the differenced series, so we've successfully detrended the graph in just a single line. Again, we can perform the Ljung-Box test to verify this formally.
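As a sketch of that verification (my addition, assuming statsmodels is installed), the Ljung-Box test checks whether the autocorrelations of a series are jointly zero:

from statsmodels.stats.diagnostic import acorr_ljungbox

# Test for remaining autocorrelation in the differenced series at lags up to 10.
# A small p-value suggests some pattern (e.g. seasonality) still remains.
lb_test = acorr_ljungbox(differenced, lags=[10], return_df=True)
print(lb_test)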

Why does differencing work?

For linear trends:

Suppose our time series had only a linear trend. So each data point can be modeled with:

y_t = α + β·t + ϵ_t

Our differencing function looks like:

y′_t = y_t - y_{t-1}

Substituting the linear equation for each of the terms in our differencing equation:

y′_t = (α + β·t + ϵ_t) - (α + β·(t-1) + ϵ_{t-1}) = β + ϵ_t - ϵ_{t-1}

So, we've removed every "t" term from the equation. All we have left is the constant β and two random noise terms: we have successfully detrended the series.
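A quick numeric illustration (my own sketch, not from the original post): differencing a noise-free linear series leaves only the constant slope β:

# A pure linear trend: y_t = 10 + 2t, so beta = 2
linear_series = pd.Series(10 + 2 * np.arange(10))
print(linear_series.diff().dropna().tolist())  # [2.0, 2.0, ..., 2.0]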

For polynomial trends:

Polynomial trends are a bit more complicated. Each data point is modeled with a polynomial function plus a random noise term:

y_t = α + β_1·t + β_2·t² + … + β_n·tⁿ + ϵ_t

If we perform differencing once, i.e. first-order differencing, we get:

y′_t = y_t - y_{t-1}

Substituting the polynomial model for each of the terms in our differencing equation:

y′_t = (α + β_1·t + … + β_n·tⁿ + ϵ_t) - (α + β_1·(t-1) + … + β_n·(t-1)ⁿ + ϵ_{t-1})

Simplifying (the α terms cancel):

y′_t = β_1·(t - (t-1)) + β_2·(t² - (t-1)²) + … + β_n·(tⁿ - (t-1)ⁿ) + ϵ_t - ϵ_{t-1}

Each term here is a difference of powers, which we can expand. Simplifying the first term gives just the constant β_1:

β_1·(t - (t-1)) = β_1

Simplifying the second term gives a polynomial of degree 1:

β_2·(t² - (t-1)²) = β_2·(2t - 1)

And so on: expanding tᵏ - (t-1)ᵏ with the binomial theorem always leaves a polynomial of degree k-1.

So, differencing reduces the degree of the polynomial trend by 1. We can keep differencing again and again until we've removed all polynomial components from the trend.
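To see this degree reduction in action (again, my own sketch), differencing a quadratic series once leaves a linear series, and differencing twice leaves a constant:

# A pure quadratic trend: y_t = t^2
quadratic_series = pd.Series(np.arange(10) ** 2, dtype=float)
print(quadratic_series.diff().dropna().tolist())         # [1.0, 3.0, 5.0, ...] (linear in t)
print(quadratic_series.diff().diff().dropna().tolist())  # [2.0, 2.0, 2.0, ...] (constant)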

Conclusion

Differencing proves to be a valuable technique in the time-series analyst's toolkit. By effectively removing trends, it enhances our ability to uncover underlying patterns and make reliable forecasts. Mastering this method not only improves the accuracy of predictive models but also deepens our understanding of the dynamics hidden within time-series data.